---
sidebar_position: 0
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Configuration ⚙️

To run re_data, you need to configure which tables should be monitored and set some properties of that monitoring. You may also need to update some of the default vars used by re_data, for example to run it for specific time windows or to compute only the types of metrics you need.

## Model level config

Currently, re_data supports dbt-native configuration by leveraging dbt model and source configs.

If you are not familiar with dbt model and source configuration, we encourage you to check the dbt [model-configs](https://docs.getdbt.com/reference/model-configs) and [source-configs](https://docs.getdbt.com/reference/source-configs) documentation.

re_data dbt-native configuration follows the same precedence rules as dbt configuration: a config block inside a model has the highest priority, and configuration in `dbt_project.yml` has the lowest.

<Tabs
  defaultValue="config_block"
  values={[
    {label: 'Config block', value: 'config_block'},
    {label: 'Property file', value: 'property_file'},
    {label: 'Project file', value: 'project_file'},
  ]}>

  <TabItem value="config_block">

```sql title="<model_name>.sql"
{{
    config(
      re_data_monitored=true,
      re_data_time_filter='creation_time',
      re_data_columns=['amount', 'status'],
      re_data_metrics_groups=['table_metrics', 'column_metrics'],
      re_data_metrics={'table': ['orders_above_100'], 'column': {'status': ['distinct_values']}},
      re_data_anomaly_detector={'name': 'modified_z_score', 'threshold': 3.0, 'direction': 'both'},
      re_data_owners=['datateam']
    )
}}


select ...
```
  </TabItem>

  <TabItem value="property_file">

```yml title="schema.yml"
version: 2

models:
  - name: pending_orders
    config:
      re_data_monitored: true
      re_data_time_filter: created_at
      re_data_columns:
        - amount
        - status
      re_data_metrics_groups:
        - table_metrics
      re_data_metrics:
        table:
          - orders_above_100
        column:
          status:
            - distinct_values
      re_data_anomaly_detector:
        name: modified_z_score
        threshold: 3
        direction: both
          
          
```
  </TabItem>

  <TabItem value="project_file">

```yml title="dbt_project.yml"
models:
  toy_shop:
    revenue:
      +re_data_monitored: true
      +re_data_time_filter: created_at
      +re_data_anomaly_detector:
        name: modified_z_score
        threshold: 3
        direction: both
      +re_data_metrics_groups:
        - table_metrics

      orders_per_age:
        +re_data_metrics:
          table:
            - orders_above_100

sources:
  toy_shop:
    toy_shop_sources:
      toy_shop_customers:
        +re_data_monitored: true
        +re_data_time_filter: joined_at
  
seeds:
  toy_shop:
    order_items:
      +re_data_monitored: true
      +re_data_time_filter: added_at
      +re_data_anomaly_detector:
        name: z_score
        threshold: 3
      +re_data_columns:
        - name
        - amount

```

  </TabItem>

</Tabs>

:::info
dbt 1.1.0 introduced config for sources, so `Property file` configuration for sources is available since dbt 1.1.0.
:::

You can define configuration on many levels; it's common to enable re_data for a group of tables (for example, all sources). It's also common to override some of the properties for specific groups.

Now let's go over the configuration you can set on models.

## Monitoring properties

### re_data_monitored

Set to `true` to enable monitoring for a given table or set of tables.

### re_data_time_filter (optional)

SQL expression (for example, a column name) used to filter the table's records to a specific time range. Set it to `null` if you wish to compute metrics on the whole table. During the run, this expression is compared to the `re_data:time_window_start` and `re_data:time_window_end` vars (described below).
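
As a minimal sketch of the two cases, the property file below disables the time filter for one model and uses a SQL expression for another. The model name `customers_snapshot` and the `cast(...)` expression are assumptions made for illustration; adjust them to your project and SQL dialect.

```yml title="schema.yml"
version: 2

models:
  # hypothetical model without a usable time column
  - name: customers_snapshot
    config:
      re_data_monitored: true
      # no time filter: metrics are computed on the whole table
      re_data_time_filter: null

  - name: pending_orders
    config:
      re_data_monitored: true
      # any SQL expression can be used, not only a bare column name (assumed example)
      re_data_time_filter: "cast(created_at as timestamp)"
```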

### re_data_columns (optional)

Set of columns for which re_data should compute metrics. If not specified, re_data will compute stats for all columns with either numeric or text types.

### re_data_metrics_groups (optional)

List of groups of metrics to compute. You can use any of the groups defined under `re_data:metrics_groups` in your vars here.
If not specified, re_data will compute the metrics defined by the `re_data:default_metrics` variable.

### re_data_metrics (optional)

**Additional** metrics to be computed for a given table (or set of tables). These can be either `table` level or `column` level. (Check out the **[metrics](/docs/re_data/reference/metrics/overview_metric)** section to learn the distinction between the two.)

You can pass any number of already defined or custom metrics to be computed. Check out the **[extra metrics](/docs/re_data/reference/metrics/extra_metrics)** section for available metrics and **[custom metrics](/docs/re_data/reference/metrics/your_own_metric)** for ways to define your own.

In many cases where you extend the set of computed metrics, we recommend creating a new group under `re_data:metrics_groups` in your vars,
adding your metrics to it, and then setting `re_data_metrics_groups` to use that group for a set of models.
This approach is usually more flexible than adding new metrics to each model individually.
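
The recommended approach could look roughly like the sketch below. The group name `orders_metrics` is hypothetical, and `orders_above_100` is assumed to be a custom metric already defined in your project. Because `re_data:metrics_groups` is redefined as a whole, any default group you still rely on has to be repeated.

```yml title="dbt_project.yml"
vars:
  re_data:metrics_groups:
    # default group repeated, since this var replaces the default definition
    table_metrics:
      table:
        - row_count
        - freshness
    # column_metrics (omitted here for brevity) would need to be repeated too if still used

    # hypothetical group holding the extra metric
    orders_metrics:
      table:
        - orders_above_100

models:
  toy_shop:
    revenue:
      +re_data_monitored: true
      +re_data_metrics_groups:
        - table_metrics
        - orders_metrics
```

With this layout, adding another metric for all `revenue` models means editing only the `orders_metrics` group.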

### re_data_anomaly_detector (optional)

Alternative anomaly detector, with its parameters, to use when detecting anomalies in a given model (or set of models).

For configuration details, see [Anomaly Detection](anomaly_detection).

### re_data_owners (optional)

Group or single person who should receive an alert about problems with a given model.

For configuration details, see [Notifications](notifications/configuring_owners).

## Global config vars

Apart from model-specific config, re_data lets you edit the global configuration for some parameters.
All of them are optional: re_data starts with sensible defaults and lets you override them when needed.

```yaml title="Parameters of re_data configuration"
vars:
  # (optional) if not passed, stats for last day will be computed
  re_data:time_window_start: '{{ (run_started_at - modules.datetime.timedelta(1)).strftime("%Y-%m-%d 00:00:00") }}'
  re_data:time_window_end: '{{ run_started_at.strftime("%Y-%m-%d 00:00:00") }}'
  
  # (optional) restrict monitoring to the listed models, sources, or tags
  re_data:select:
    - model_name1
    - model_name2
    - source_name1

  # (optional) how much history to consider when looking for anomalies
  re_data:anomaly_detection_look_back_days: 30

  # (optional) configuring storing tests history
  re_data:save_test_history: true

  # (optional) querying db for failing rows
  re_data:query_test_failures: true

  # (optional) limit the number of failed rows returned per test
  re_data:test_history_failures_limit: 10

  # (optional) configuring storing table samples
  re_data:store_table_samples: true

  # (optional) configuring owners
  re_data:owners_config:
    datateam:
      - type: slack
        identifier: U02FHBSXXXX
        name: user1
    backend:
      - type: email
        identifier: user1@getre.io
        name: user1

```

### re_data:time_window_start, re_data:time_window_end

re_data metrics are time-based: re_data filters all your table data to a specific time window.
In general, we advise setting up the time window so that all new data is monitored.
It's also possible to compute metrics from overlapping data, for example the last 7 days.

By default, re_data computes daily stats for the last day (it uses the exact configuration from the example above).
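
The overlapping-window case mentioned above could be sketched as follows. It reuses the default expressions and only changes the `timedelta` argument from 1 to 7 days; treat it as an illustration rather than a recommended setting.

```yml title="dbt_project.yml"
vars:
  # start of the window: midnight seven days before the run
  re_data:time_window_start: '{{ (run_started_at - modules.datetime.timedelta(7)).strftime("%Y-%m-%d 00:00:00") }}'
  # end of the window: midnight of the run day (same as the default)
  re_data:time_window_end: '{{ run_started_at.strftime("%Y-%m-%d 00:00:00") }}'
```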

### re_data:select

This list lets you further restrict re_data to compute metrics and anomalies only for certain models.
Each model listed here still needs to have `re_data_monitored=true` to be monitored.
If the list is not passed, re_data will compute stats for all `re_data_monitored=true` models.

List elements can be either model/source names or dbt tags that your models have. An example select section with tags looks like this:

```yaml
vars:
  ...

  re_data:select:
    - tag:my_tag_name
    - tag:my_other_tag_name
    - specific_table
    - other_specific_table
```

### re_data:metrics_groups

Groups of metrics to compute. By default, `table_metrics` and `column_metrics` are defined here as follows:

```yaml
re_data:metrics_groups:
  table_metrics:
    table:
      - row_count
      - freshness

  column_metrics:
    column:
      numeric:
        - min
        - max
        - avg
        - stddev
        - variance
        - nulls_count
        - nulls_percent
      text:
        - min_length
        - max_length
        - avg_length
        - nulls_count
        - missing_count
        - nulls_percent
        - missing_percent
```

You can redefine this var any way you want. If you remove the `table_metrics` or `column_metrics` group, you will no longer be able to use it in `re_data_metrics_groups` settings.

### re_data:default_metrics

Default metrics to compute for each model if no `re_data_metrics_groups` is specified. You can use any of the metrics groups defined in `re_data:metrics_groups` here.
The default re_data configuration is as follows:

```yaml
re_data:default_metrics:
  - table_metrics
  - column_metrics
```
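
As a small sketch of overriding this default, the vars below would make models without an explicit `re_data_metrics_groups` setting compute only the table-level group. Whether dropping `column_metrics` is appropriate depends on your project.

```yml title="dbt_project.yml"
vars:
  re_data:default_metrics:
    # only table-level metrics unless a model sets re_data_metrics_groups
    - table_metrics
```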

### re_data:anomaly_detector

See [Anomaly Detection](anomaly_detection)

### re_data:anomaly_detection_look_back_days

The period which `re_data` considers when looking for anomalies. (By default, it's 30 days)

### re_data:save_test_history

Variable to enable storing test history. See [re_data tests history](/docs/re_data/reference/tests/history) for more details.

### re_data:query_test_failures

Variable to configure if re_data should query failed rows (`true` by default)

### re_data:test_history_failures_limit

Variable to configure how many failed rows to fetch per table (`10` by default)

### re_data:store_table_samples

Variable to enable storing sample data of monitored tables.

### re_data:owners_config

Variable to configure owners for your data. See [re_data notifications](/docs/re_data/reference/notifications/configuring_owners) for more details.

